Conference Proceedings

Tailoring the Shapley Value for In-Context Example Selection Towards Data Wrangling

Z Liang, H Wang, X Ding, Z Liang, C Liang, Y Tang, J Qi

Proceedings International Conference on Data Engineering | IEEE | Published: 2025

Abstract

Data wrangling (DW) is a fundamental step to prepare data for downstream mining tasks. Recent studies explore large language models (LLMs) to form a lightweight DW paradigm. Such studies typically require prompting an LLM with a DW task together with a few examples as task demonstrations (i.e., in-context learning). A problem yet to be explored is how to select the examples, to maximize task effectiveness given constraints on the size of the examples. To fill this gap, we introduce the constrained Shapley value (CSV), a tailored variant of the Shapley value with a constraint on the LLM prompt size, to guide example selection. We show that CSV has desirable properties in example importance es..
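As background for the abstract above, the classic Shapley value scores each example by its average marginal contribution to a utility over coalitions of the other examples. The sketch below computes it exactly for a small pool and adds a hypothetical size cap (`max_size`) that only considers coalitions fitting within a fixed number of examples. The cap, the function name `capped_shapley`, and the toy utilities are assumptions made for illustration; this is not the paper's CSV definition, which the truncated abstract does not give.

```python
from itertools import combinations
from math import comb


def capped_shapley(items, utility, max_size=None):
    """Exact Shapley-style scores, optionally limited to coalitions of bounded size.

    items:    list of distinct hashable examples
    utility:  function mapping a frozenset of examples to a number
    max_size: hypothetical cap on how many examples fit in one prompt;
              None (or a value >= len(items)) gives the classic Shapley value.
    """
    n = len(items)
    k = n if max_size is None else min(max_size, n)
    if n > 0 and k < 1:
        raise ValueError("max_size must be at least 1")

    values = {}
    for item in items:
        others = [x for x in items if x != item]
        total = 0.0
        # Coalition sizes 0..k-1, so the coalition plus `item` never exceeds k.
        for size in range(k):
            weight = 1.0 / comb(n - 1, size)  # uniform over coalitions of this size
            for subset in combinations(others, size):
                coalition = frozenset(subset)
                total += weight * (utility(coalition | {item}) - utility(coalition))
        values[item] = total / k  # uniform over the allowed sizes
    return values


# Toy utilities standing in for "task effectiveness" (assumed, not from the paper).
weights = {"a": 1.0, "b": 2.0, "c": 3.0}


def additive_utility(coalition):
    return sum(weights[x] for x in coalition)


def quadratic_utility(coalition):
    return float(len(coalition) ** 2)


if __name__ == "__main__":
    print(capped_shapley(list(weights), additive_utility))
    print(capped_shapley(list(weights), quadratic_utility, max_size=2))
```

Exact enumeration grows exponentially with the pool size, so a sketch like this is only practical for a handful of candidate examples; larger pools would need sampling-based approximation.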


University of Melbourne Researchers